Abstract

In this paper we address two problems: deriving a taxonomy
from a text, and concept-oriented text segmentation.
The Formal Concept Analysis (FCA) method
is applied to solve both of these linguistic problems.
The proposed segmentation method offers a conceptual
view of text segmentation, using a context-driven
clustering of sentences. The Concept-oriented Clustering
Segmentation algorithm (COCS) is based on k-means
linear clustering of the sentences. Experimental results
obtained using the COCS algorithm are presented.

1. Introduction 

Formal Concept Analysis (FCA) studies how objects
can be hierarchically grouped together according to their
common attributes in a given context. Linguists
often characterize datasets using distinct features, such
as semantic components or syntactical and grammatical
markers, which can easily be interpreted using
FCA. However, linguists argue that formal concepts
are quite different from the cognitive processes relating to
natural language [13]. This is why current FCA applications
in linguistics focus more on formal structures
than on cognitive linguistic phenomena.

Nevertheless, in the linguistic domain FCA applications
provide a very suitable alternative to statistical
methods.

In this paper we address the problem of deriving a
taxonomy from a text for text segmentation by concept-driven
clustering. This conceptual view of segmentation
is useful when different users have quite different
needs with regard to the way a text is segmented.

The only knowledge required by our Concept-oriented
Clustering Segmentation algorithm (COCS) is the
taxonomy derived from the text. COCS uses the k-means
algorithm for a linear clustering of the sentences.

The paper is structured as follows: Section 2 introduces
the basic notions of ontologies and FCA. Section
3 surveys the related work on taxonomy extraction
from a text and on text segmentation. Section 4 introduces
the CLTE (Concept Lattice - Taxonomy Extraction)
algorithm and the COCS algorithm for text segmentation.
In Section 5 experimental results obtained using the COCS
algorithm are presented. We finish the paper with
conclusions and future work directions in Section 6.

2. Abstract Ontologies and FCA 

Following [6], an ontology is a formal specification
of a shared conceptualization of a domain of interest
to a group of users. Formal implies that the ontology
should be machine readable, and shared implies that it is
accepted by a group or community.

Definition 1. An abstract ontology O is a model
represented by:

$O = (C, H, R, A)$

where:

• C is a set of concepts (concept identifiers);

• H is a taxonomic relation (IS-A) between concepts,
$H \subseteq C \times C$, that is, a partial order on C;

• R is a set of non-taxonomic relations, $R \subseteq C \times C$;

• A is a set of logical axioms (or inference rules).
Most approaches focus on the first two elements of
an ontology, C and H, which form the "core ontology",
while research on the sets R and A is much less
developed.

The above definition does not distinguish between
a concept and its lexical expression. Completing
O with a lexicon would address the problems
of synonymy (a set of lexical expressions represents
the same concept) and polysemy (a lexical
expression represents a set of concepts).

For the particular case of learning a taxonomy from
a text, we present the method used in [4] and our
proposed version.



2.1. A short survey of Formal Concept Analysis (FCA)

FCA was introduced by B. Ganter and R.
Wille in 1982 (for a textbook see Q). In recent
years, FCA has grown into an international research
community with applications in many different
domains, such as artificial intelligence, linguistics, software
engineering, medicine, etc. Formal concepts in FCA
can be seen as a mathematical formalization of what
has been called the theory of concepts, which states
that a concept is formally defined via its features [13].
From a philosophical point of view, a concept is a
unit consisting of two parts: the extension (the set
of objects belonging to this concept) and the intension
(the set of attributes valid for all these objects). The
frame for defining a set of concepts is the so-called
Formal Context.

Definition 2. A Formal Context is a triple:

$K = (G, M, I)$

where G is the set of objects, M is the set of
attributes, and I is a binary relation between G and
M ($I \subseteq G \times M$), representing the incidence relation.
The pair $(g, m) \in I$ is read as "the object g has the
attribute m".

Usually a Formal Context is given by an incidence
matrix, where a star "*" in the row of g and the column
of m means that the object g has the attribute m.

For a set $A \subseteq G$, the set of all attributes shared by
the objects from A, called the "derivative" of A and
denoted by A', is defined as:

$A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}$

Dually, for a set $B \subseteq M$, the set of all objects which
share the attributes from B, called the "derivative" of
B and denoted by B', is defined as:

$B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\}$
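The two derivation operators can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the incidence rows of the tourism example used in Section 3.1 are an assumption, read off the Formal Concepts of Table 2 (each object's intent is the intent of the smallest concept containing it).

```python
from typing import Dict, FrozenSet, Iterable, Set

# Tourism context as object -> attributes (assumption: rows read off Table 2).
CONTEXT: Dict[str, Set[str]] = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def derive_objects(objs: Iterable[str]) -> FrozenSet[str]:
    """A': the attributes shared by all objects in A (all of M if A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objs:
        shared &= CONTEXT[g]
    return frozenset(shared)


def derive_attributes(attrs: Iterable[str]) -> FrozenSet[str]:
    """B': the objects having every attribute in B (all of G if B is empty)."""
    wanted = set(attrs)
    return frozenset(g for g, ms in CONTEXT.items() if wanted <= ms)


print(sorted(derive_objects({"car", "motor-bike"})))  # ['bookable', 'driveable', 'rentable']
print(sorted(derive_attributes({"rentable"})))        # ['apartment', 'car', 'motor-bike']
```

Note the conventions for empty arguments: the empty set of objects shares every attribute, and the empty set of attributes is shared by every object.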

Definition 3. A Formal Concept of the Formal
Context $K = (G, M, I)$ is a pair (A, B), with $A \subseteq G$,
$B \subseteq M$, satisfying the relations:

$A' = B$ and $B' = A$

The set A is called the extent of the Formal Concept
(A, B) and the set B is called the intent of the same
Formal Concept.
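Definition 3 can be checked mechanically once the derivation operators are available. The sketch below uses hypothetical helper names and the same tourism rows, which are assumed from Table 2; it simply tests both equalities.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Tourism context (assumption: rows read off Table 2).
CONTEXT: Dict[str, Set[str]] = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def intent_of(objs: Iterable[str]) -> FrozenSet[str]:
    shared = set(ATTRIBUTES)
    for g in objs:
        shared &= CONTEXT[g]
    return frozenset(shared)


def extent_of(attrs: Iterable[str]) -> FrozenSet[str]:
    wanted = set(attrs)
    return frozenset(g for g, ms in CONTEXT.items() if wanted <= ms)


def is_formal_concept(A: Iterable[str], B: Iterable[str]) -> bool:
    """Definition 3: (A, B) is a Formal Concept iff A' = B and B' = A."""
    a, b = frozenset(A), frozenset(B)
    return intent_of(a) == b and extent_of(b) == a


print(is_formal_concept({"excursion", "trip"}, {"bookable", "joinable"}))  # True (C5)
print(is_formal_concept({"car"}, {"bookable", "rentable", "driveable"}))  # False
```

The second pair fails because, although {car}' = {bookable, rentable, driveable}, that attribute set is also shared by motor-bike, so B' is not equal to A.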

Between the Formal Concepts, the subconcept-superconcept
relation $\le$ is defined as follows:

$(A_1, B_1) \le (A_2, B_2)$ if and only if $A_1 \subseteq A_2$

or, equivalently,

$(A_1, B_1) \le (A_2, B_2)$ if and only if $B_2 \subseteq B_1$.

The set of all Formal Concepts of a Formal Context
K, together with the order relation $\le$, forms a complete
lattice called the Concept lattice and denoted $\mathfrak{B}(K)$.
The top (the greatest element of the Concept lattice) is
$1_{\mathfrak{B}(K)}$ and the bottom (the least element of the Concept
lattice) is $0_{\mathfrak{B}(K)}$.
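For small contexts the whole Concept lattice can be enumerated naively: for every $B \subseteq M$ the pair (B', B'') is a Formal Concept, and every concept arises this way. The sketch below is exponential in |M| and therefore only suitable for toy examples; the names are hypothetical and the tourism rows are assumed from Table 2. It lists the concepts, applies the order defined above and locates the top and the bottom.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

CONTEXT: Dict[str, Set[str]] = {  # assumption: rows read off Table 2
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def extent_of(attrs: Iterable[str]) -> FrozenSet[str]:
    wanted = set(attrs)
    return frozenset(g for g, ms in CONTEXT.items() if wanted <= ms)


def intent_of(objs: Iterable[str]) -> FrozenSet[str]:
    shared = set(ATTRIBUTES)
    for g in objs:
        shared &= CONTEXT[g]
    return frozenset(shared)


def all_concepts() -> List[Concept]:
    """Naive enumeration of (B', B'') for every B subset of M."""
    found = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            ext = extent_of(subset)
            found.add((ext, intent_of(ext)))
    # Larger extents first: a linear order compatible with the lattice order.
    return sorted(found, key=lambda c: (-len(c[0]), sorted(c[1])))


def leq(c1: Concept, c2: Concept) -> bool:
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


concepts = all_concepts()
top = next(c for c in concepts if all(leq(d, c) for d in concepts))
bottom = next(c for c in concepts if all(leq(c, d) for d in concepts))
print(len(concepts))                      # 6, as in Table 2
print(sorted(top[1]), sorted(bottom[0]))  # ['bookable'] []
```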

Each node X of the lattice is characterized by a
set of objects A and a set of attributes B. The set A
is formed by all the objects situated on paths which
begin with X (including X) and end at the bottom
of the lattice, and the set B is formed by all the attributes
situated on paths which begin at the top and end with X
(including X). Moreover, A' = B and B' = A, and
thus the node labeled by the pair (A, B) represents a
Formal Concept.

Remarks: 1. Each object and each attribute is introduced
at a single node. 2. The objects situated lower (higher)
in the lattice have more (fewer) attributes. 3. The
attributes situated lower (higher) in the lattice are shared
by fewer (more) objects.

Rules for simplifying the Concept lattice are applied
when the Formal Context is not clarified or has reducible
objects or attributes:

Definition 4. A Formal Context is clarified if no two
of its objects have equal intents and no two of its
attributes have equal extents. These properties can
be observed from the incidence matrix of the Formal
Context.

Definition 5. An attribute m of a clarified Formal
Context is reducible if there is a set $S \subseteq M$ of
attributes with $m \notin S$ such that $\{m\}' = S'$; otherwise it is irreducible.
Reducible objects are defined dually.

Remark: If m is reducible, it can be deleted from
the Formal Context without changing the structure of the
Concept lattice (dually for a reducible object).
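Definitions 4 and 5 can be tested directly on the incidence data. In the sketch below (hypothetical names; tourism rows assumed from Table 2) the context is first clarified by merging objects with equal intents, as excursion and trip have. For reducibility it suffices to try S = all other attributes whose extents contain {m}', since any working S is a subset of that set; for S empty, S' is the whole object set G.

```python
from typing import Dict, FrozenSet, List, Set

CONTEXT: Dict[str, Set[str]] = {  # assumption: rows read off Table 2
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}


def attribute_extent(ctx: Dict[str, Set[str]], m: str) -> FrozenSet[str]:
    """{m}': the objects that have attribute m."""
    return frozenset(g for g, ms in ctx.items() if m in ms)


def clarify_objects(ctx: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Merge objects with equal intents into one object, e.g. 'excursion/trip'."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for g, ms in ctx.items():
        groups.setdefault(frozenset(ms), []).append(g)
    return {"/".join(sorted(gs)): set(ms) for ms, gs in groups.items()}


def is_clarified(ctx: Dict[str, Set[str]]) -> bool:
    """Definition 4: no two objects share an intent, no two attributes share an extent."""
    attrs = set().union(*ctx.values())
    intents = [frozenset(ms) for ms in ctx.values()]
    extents = [attribute_extent(ctx, m) for m in attrs]
    return len(set(intents)) == len(intents) and len(set(extents)) == len(extents)


def is_reducible(ctx: Dict[str, Set[str]], m: str) -> bool:
    """Definition 5: {m}' = S' for some S with m not in S."""
    target = attribute_extent(ctx, m)
    closure = set(ctx)  # S' for S empty is G
    for n in set().union(*ctx.values()) - {m}:
        ext = attribute_extent(ctx, n)
        if target <= ext:
            closure &= ext
    return closure == target


clarified = clarify_objects(CONTEXT)
print(is_clarified(CONTEXT), is_clarified(clarified))  # False True
print(sorted(clarified))
print([m for m in sorted(set().union(*clarified.values())) if is_reducible(clarified, m)])  # ['bookable']
```

In this example bookable is reducible because every object has it, so {bookable}' = G, which is the derivative of the empty attribute set.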

By reading from this Concept lattice the labels which
introduce attributes, and transforming the lattice
into a tree such that all the inheritances between
attributes are kept, a taxonomic hierarchy is obtained
(see Sections 3 and 4).

3. Related work 

3.1. Automatic learning of a taxonomy from a text: Cimiano's approach

The best-known work on automatic learning
of a taxonomy from a text is that of the Karlsruhe
team [4], [5]. Let us present the example introduced in
[4] for obtaining a taxonomy from a text in the tourism
domain using FCA.
Table 1. The incidence matrix for tourism example

Concept | Extent of concept                                    | Intent of concept
C1      | {apartment, car, motor-bike, trip, excursion, hotel} | {bookable}
C2      | {apartment, car, motor-bike}                         | {bookable, rentable}
C3      | {car, motor-bike}                                    | {bookable, rentable, driveable}
C4      | {motor-bike}                                         | {bookable, rentable, driveable, rideable}
C5      | {excursion, trip}                                    | {bookable, joinable}
C6      | ∅                                                    | {bookable, rentable, driveable, rideable, joinable}

Table 2. The Formal Concepts for tourism example



The Formal Context is obtained by selecting as M the
set of transitive verbs from a text and as G the set of
nouns playing the role of (direct) complement for the
verbs from M.

For the selected domain:

M = {bookable, rentable, driveable, rideable, joinable},
G = {apartment, car, excursion, motor-bike, trip, hotel}

and the relation I is given by the incidence matrix
(Table 1).

According to the method for obtaining the Concept
Lattice ([6]), the set of all Formal Concepts is
represented in Table 2. Applying the definition of the
subconcept relation, the following Concept lattice is
obtained:



                  C1  bookable / (hotel)
                 /                      \
                /                        \
  C2  rentable / (apartment)        C5  joinable / ({excursion, trip})
        |                                  |
  C3  driveable / (car)                    |
        |                                  |
  C4  rideable / (motor-bike)              |
                \                        /
                 \                      /
                            C6



Let us remark that the lattice is not clarified, because
the objects excursion and trip have the same
intent: {bookable, joinable}. This is why the
"object" label of node C5 is the set
{excursion, trip}.

Example 1. Consider the concept $C_1 = (A_1, B_1)$
from the previous lattice. Here the extent $A_1$ is
formed by all the objects situated on paths starting
with $C_1$: $A_1$ = G = {apartment, car, motor-bike,
excursion, trip, hotel}. The intent is $B_1$ =
{bookable}. The relations $A_1' = B_1$ and $B_1' = A_1$
are verified.

For the concept $C_2 = (A_2, B_2)$ the extent is
$A_2$ = {apartment, car, motor-bike} and the intent
is $B_2$ = {bookable, rentable}. Again, the relations
$A_2' = B_2$ and $B_2' = A_2$ are verified. The Concept
lattice relation $C_2 \le C_1$ holds, because $A_2 \subseteq A_1$
(and, equivalently, $B_1 \subseteq B_2$).

From the Concept lattice of the tourism example the
following taxonomy H is obtained:

    bookable
    ├── joinable
    │   ├── excursion
    │   └── trip
    ├── rentable
    │   ├── apartment
    │   └── driveable
    │       ├── car
    │       └── rideable
    │           └── motor-bike
    └── hotel



Remark: In this kind of taxonomy the names of the
verbs could be replaced by the names of the corresponding
nouns to improve the readability of the taxonomy: for
example, joinable could be replaced by join, or
driveable by vehicle.
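One simple way to read such a hierarchy off a context is sketched below. It is our own simplification, not Cimiano's exact procedure nor the CLTE rules of Section 4.1: the parents of an attribute m are the attributes whose extents strictly contain {m}' and are minimal among those, and the parents of an object g are the attributes of {g}' with minimal extents. Names are hypothetical and the tourism rows are assumed from Table 2; on this input the printed tree matches the taxonomy H above.

```python
from typing import Dict, List, Set

CONTEXT: Dict[str, Set[str]] = {  # assumption: rows read off Table 2
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}


def taxonomy(ctx: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map every attribute and object to its parent attributes.

    A node with several parents makes the result a quasi-tree rather than a tree.
    Assumes that attributes have pairwise distinct extents."""
    attrs = sorted(set().union(*ctx.values()))
    ext = {m: frozenset(g for g, ms in ctx.items() if m in ms) for m in attrs}

    def most_specific(cands: List[str]) -> List[str]:
        # Keep the candidates whose extents are minimal w.r.t. set inclusion.
        return sorted(c for c in cands if not any(ext[d] < ext[c] for d in cands))

    parents: Dict[str, List[str]] = {}
    for m in attrs:
        parents[m] = most_specific([n for n in attrs if ext[m] < ext[n]])
    for g, ms in ctx.items():
        parents[g] = most_specific(sorted(ms))
    return parents


def show(parents: Dict[str, List[str]], node: str, depth: int = 0) -> None:
    print("  " * depth + node)
    for child in sorted(c for c, ps in parents.items() if node in ps):
        show(parents, child, depth + 1)


tax = taxonomy(CONTEXT)
for root in sorted(n for n, ps in tax.items() if not ps):
    show(tax, root)
```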

As mentioned above, in [4] the
Formal Context is obtained by selecting as M the set of
transitive verbs from a text and as G the set of nouns
playing the role of (direct) complement for the verbs
from M (subcategorized by the transitive verbs in M).
When the corpus is not large enough, some extracted
object/attribute pairs may be wrong (the noun is not
really a complement of the verb), and other valid pairs
may be missed. To reduce this risk, Cimiano clustered the nouns
and the verbs using a vector model, and finally he
considered clusters of nouns as objects and clusters of
verbs as attributes, instead of individual nouns and verbs.

To obtain the vectors he considered the conditional
probability P(n | v), where $P(a \mid b) = \frac{f(a, b)}{f(b)}$.
Here f(n, v) represents the frequency of occurrences
of the noun n as a complement of the
verb v. An improved value of P(n | v) is obtained
by first performing a clustering of nouns and verbs 0.
For this goal he calculated for each noun n and
verb v the vectors $V_n = (P(n \mid v_1), \ldots, P(n \mid v_l))$
and $V_v = (P(n_1 \mid v), \ldots, P(n_k \mid v))$, and
defined the similarity between nouns and between
verbs as $sim(n_1, n_2) = \cos(V(n_1), V(n_2))$ and
$sim(v_1, v_2) = \cos(V(v_1), V(v_2))$.
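The quantities used in one step of this procedure can be sketched as follows. The (verb, noun) occurrences below are hypothetical toy data, not taken from [4], and the alternating clustering loop itself is not shown; the sketch only computes P(n | v), the vectors $V_n$ and $V_v$, and their cosine similarity.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical (verb, complement noun) occurrences, for illustration only.
PAIRS: List[Tuple[str, str]] = [
    ("book", "hotel"), ("book", "trip"), ("book", "car"), ("book", "apartment"),
    ("book", "excursion"), ("rent", "car"), ("rent", "apartment"),
    ("drive", "car"), ("join", "trip"), ("join", "excursion"),
]

pair_freq = Counter(PAIRS)                # f(n, v), stored under the key (v, n)
verb_freq = Counter(v for v, _ in PAIRS)  # f(v)
VERBS = sorted(verb_freq)
NOUNS = sorted({n for _, n in PAIRS})


def p(n: str, v: str) -> float:
    """P(n | v) = f(n, v) / f(v)."""
    return pair_freq[(v, n)] / verb_freq[v]


def noun_vector(n: str) -> List[float]:
    return [p(n, v) for v in VERBS]  # V_n = (P(n|v_1), ..., P(n|v_l))


def verb_vector(v: str) -> List[float]:
    return [p(n, v) for n in NOUNS]  # V_v = (P(n_1|v), ..., P(n_k|v))


def cosine(x: List[float], y: List[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


print(cosine(noun_vector("trip"), noun_vector("excursion")))  # identical profiles
print(cosine(noun_vector("trip"), noun_vector("car")))        # weakly similar
print(cosine(verb_vector("book"), verb_vector("rent")))
```

Nouns that occur with the same verbs in the same proportions, like trip and excursion here, get identical vectors and would be merged first.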

At each step he recalculated all P(n | v), considering the
clustered nouns n together, and then the
clustered verbs v together. He alternated
noun clustering and verb clustering until P(n | v) was
above some threshold. The obtained clusters of nouns
and verbs represent objects and attributes, respectively.
The incidence relation between n and v means the
occurrence of an element from the cluster of n as a
complement of an element of the cluster of v.

Cimiano also proposed (0) the relation between verbs (as
objects) and subject nouns (as attributes), and showed that
using both these dependencies leads to better results.

3.2. Related work in segmentation

A discourse segment consists of a sequence of sentences
that display local coherence. Text segmentation
is the automatic identification of boundaries between
segments. The need for discourse segmentation derives
from its applicability in many fields, for example:

• Information Retrieval (IR). Many authors, like fSl
and [15], showed that segmenting into distinct
topics is useful, as IR needs to find relevant
portions of text that match a given query;

• Anaphora resolution (AR). Searching only some
segments of the text for the antecedents of
referential expressions could improve the
quality of AR (ifHl);

• Text summarization. Segmentation as a
preprocessing step in automatic summarization (as
in this paper) could improve the quality of
summaries 0.

While the need for discourse segmentation is
almost universally agreed upon, there is no consensus
on how segmentation should be accomplished |Q].
However, the main directions of segmentation
can be classified as follows:

• Topical text segmentation relies on finding the
sentences that mark boundaries (topic shifts)
in the discourse. The usual method is the
calculation of a similarity which measures the proximity
between sequences of sentences or clauses
(0);

• Lexical chain segmentation methods rely on
lexical chains, which display the cohesion that
arises from semantic relationships between words,
relationships derived from WordNet or Roget's
Thesaurus (0, ifTTl);

• Referential analysis segmentation methods work
as follows: if a referring expression is used that
requires an antecedent situated in a previous
sentence, then all sentences between the antecedent
and the referring expression are considered to be in
the same segment (ifTZll);

• Earlier discourse segmentation methods are
Rhetorical Structure Theory (ifTUl) and Hobbs's
coherence relations [9], based on cue phrases (for
example, anyway marks the end of a digression in
the attentional stack method [1], and because signals a
causal relation in RST) and on a large
taxonomy of different relations that can hold
between sentences and segments.
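To make the similarity-based idea behind topical segmentation concrete, here is a simplified, hypothetical scheme; it is not the method of any of the works cited above. Each gap between consecutive sentences is scored by the cosine similarity of the word counts in a window of sentences before and after it, and a gap whose score falls below a threshold is taken as a topic shift. The window size, the threshold and the toy sentences are assumptions.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(sentences: List[List[str]], window: int = 2) -> List[float]:
    """Similarity across each gap i (between sentence i-1 and sentence i):
    the `window` sentences before the gap versus the `window` sentences after it."""
    scores = []
    for i in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, i - window):i] for w in s)
        right = Counter(w for s in sentences[i:i + window] for w in s)
        scores.append(cosine(left, right))
    return scores


def boundaries(sentences: List[List[str]], window: int = 2, threshold: float = 0.1) -> List[int]:
    """Hypothetical rule: a new segment starts at sentence i when the gap score is below threshold."""
    return [i for i, s in enumerate(gap_scores(sentences, window), start=1) if s < threshold]


sents = [["law", "rule", "state"], ["law", "state", "court"],
         ["cake", "sugar", "oven"], ["sugar", "oven", "bake"]]
print(boundaries(sents))  # [2]: the topic shifts before the third sentence (index 2)
```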

Another classification of segmentation methods relies
on the structure type of the output. In linear
segmentation the discourse is divided into a linear
sequence of adjacent segments ([8] or this paper),
while in hierarchical segmentation there are
hierarchically organized sets of segments, for example
the attentional/intentional structures of Grosz and Sidner
(0), rhetorical trees in RST ([10]) or attentional
stacks in [1]. Recently a new method of linear
segmentation has been proposed in [16], which uses
a kind of complementing set of formal concepts in the
concept lattice of a given formal context.

A final classification of segmentation methods is into
cohesion-based methods (for example, lexical chains)
and coherence-based methods (as in RST and
Hobbs's coherence relations theory).

4. This paper's proposal

4.1. Obtaining the Concept Lattice and the Concept Hierarchy from a text

FCA is used to build the Concept Lattice and 
then to extract the Concept Hierarchy from a text 
using as attributes the transitive verbs and as objects 
the corresponding nouns with the role of direct 
complements from the studied corpus. We propose 
Concept Lattice - Taxonomy Extraction (CLTE) 
algorithm which introduces specific rules for deriving 
the taxonomy as a quasi-tree from the Concept Lattice. 



Concept Lattice - Taxonomy Extraction algorithm (CLTE):

Input: Text - a text document.

Output: K - the formal context, L - the concept lattice,
T - the taxonomy based on the concept lattice.

Step1: Text-Pos = POS-tagging(Text).
Step2: Pairs = {(verb, noun-direct-complement)} = extract-pairs(Text-Pos).
Step3: Pairs-lemma = lemmatize-verbs-nouns(Pairs).
Step4: M = frequent-verbs(Pairs-lemma);
       G = frequent-nouns(Pairs-lemma).
Step5: Build the formal context K = (G, M, I),
       where (n, v) ∈ I if (v, n) ∈ Pairs-lemma.
Step6: Build the concept lattice L = $\mathfrak{B}(K)$.
Step7: Build the taxonomy T, represented as a
       quasi-tree, based on the concept lattice L.
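Steps 4 and 5 of CLTE can be sketched as follows, assuming that Steps 1 to 3 (POS tagging, pair extraction, lemmatization) have already produced the lemmatized pairs. The function name is hypothetical, and the frequency threshold of "at least two occurrences" is our reading of the implementation note in Section 5. The pairs below were extracted by hand from a few sentences of the law text in Figure 2, purely for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def build_context(
    pairs_lemma: List[Tuple[str, str]], min_freq: int = 2
) -> Tuple[Set[str], Set[str], Dict[str, Set[str]]]:
    """Steps 4-5 of CLTE: frequent verbs M, frequent nouns G, and
    (n, v) in I whenever (v, n) occurs in the lemmatized pairs."""
    verb_counts = Counter(v for v, _ in pairs_lemma)
    noun_counts = Counter(n for _, n in pairs_lemma)
    M = {v for v, c in verb_counts.items() if c >= min_freq}
    G = {n for n, c in noun_counts.items() if c >= min_freq}
    incidence: Dict[str, Set[str]] = {n: set() for n in sorted(G)}
    for v, n in pairs_lemma:
        if v in M and n in G:
            incidence[n].add(v)
    return G, M, incidence


# Hand-extracted, lemmatized (verb, direct complement) pairs (illustrative only).
PAIRS_LEMMA = [
    ("shape", "politics"), ("shape", "economics"), ("shape", "society"),  # sentence 2
    ("provide", "framework"),                                             # sentence 7
    ("govern", "affair"),                                                 # sentence 8
    ("inform", "law"),                                                    # sentence 12
    ("provide", "service"),                                               # sentence 17
    ("inform", "progress"), ("support", "progress"),                      # sentence 18
    ("govern", "affair"),                                                 # sentence 19
]
G, M, I = build_context(PAIRS_LEMMA)
print(sorted(G), sorted(M))
print(I)
```

With such a tiny sample, shape and provide enter M but have no incident noun in G; the resulting incidence relation can then be passed to any Concept lattice construction.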
Remarks:

• POS annotation is enough: no parsing of the
initial text corpus is needed. However, rules for
determining the verb - noun dependency (the noun as a
direct complement) must be used.

• In general, the taxonomy derived from a concept
lattice cannot be represented as a tree, as in
Cimiano's example, but only by a special data
structure called a quasi-tree (a node may have several
parents and two internal nodes may have the same
label), T = (X, E), with the following properties:

- X = G ∪ M, and E, the set of edges, is
obtained from the subconcept relation of the
Concept lattice according to special rules.

- The most general concept (the top of the
lattice) is the root of the quasi-tree.

- The leaves of the quasi-tree T are labeled
with nouns (objects) from G and the internal
nodes are labeled with verbs (attributes) from
M.

- Let $C^{o,a} \to C^{o',a'}$ be an edge in the Concept
lattice, where the node $C^{o,a}$ introduces the
object o and the attribute a. There are 16
cases (each of a, o, a', o' can be equal or not equal
to ∅), some of them impossible. The
most used rules for adding nodes and edges
to the taxonomy, represented as a quasi-tree,
are the following:

* if a ≠ ∅, o = ∅, a' ≠ ∅ then (a, a') ∈ E;

* if a = ∅, o ≠ ∅, a' = ∅ then (a, a') ∈ E;

* if a ≠ ∅, o ≠ ∅, a' = o' = ∅ then (a, o') ∈ E,
o is a leaf node;

* if a ≠ ∅, o ≠ ∅, a' ≠ ∅ then (a, a') ∈ E,
(a, o) ∈ E, o is a leaf node;

* if a ≠ ∅, a' = ∅, o' ≠ ∅ then (a, o') ∈ E,
o' is a leaf node;

* if a = ∅, o = ∅, a' = ∅, o' ≠ ∅ then
(a, o') ∈ E, o' is a leaf node;

* if a ≠ ∅, o = a' = o' = ∅ then (a, a) ∈ E;

• A path from the root to a leaf node provides a 
hierarchy regarding the concept terms (verbs and 
nouns) on that path. 

4.2. Concept-oriented segmentation by clustering

The process of segmentation is usually seen as an objective
method, which provides one clearly defined result.
However, different users have quite different needs
with regard to segmentation, because they view the
same text from completely different, subjective,
perspectives. Segmenting a text must therefore be associated with
an explanation of why a given set of segments is
produced. All this can be achieved by viewing the
process of segmentation as a clustering of the
sentences of a text [16].

When the cluster $Cl = \{S_{i_1}, \ldots, S_{i_m}\}$ is
one of the obtained clusters, and $i_1 < i_2 < \cdots < i_m$,
then the linear segmentation is:
$[S_1, S_{i_1 - 1}], [S_{i_1}, S_{i_2 - 1}], \ldots, [S_{i_{m-1}}, S_{i_m - 1}], [S_{i_m}, S_n]$.
The concept terms which are "specific" to this cluster Cl
(the concept terms specific to the center of cluster Cl)
explain the reason for the segmentation.
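The mapping from a cluster to a linear segmentation is simple enough to state as code. In the sketch below (hypothetical function name) sentence indices are 1-based and segments are inclusive; the handling of a cluster that contains $S_1$ itself (no empty first segment) is our assumption. Run on cluster C1 reported in Section 5, it reproduces the segmentation given there.

```python
from typing import Iterable, List, Tuple


def linear_segmentation(cluster: Iterable[int], n: int) -> List[Tuple[int, int]]:
    """Turn a cluster {S_i1, ..., S_im} of sentence indices (1-based) into the
    segments [S_1, S_i1-1], [S_i1, S_i2-1], ..., [S_im, S_n].

    If S_1 itself is in the cluster, the empty first segment is skipped."""
    starts = [1] + [i for i in sorted(set(cluster)) if i > 1]
    ends = [s - 1 for s in starts[1:]] + [n]
    return list(zip(starts, ends))


c1 = [8, 19, 27, 31, 37, 40, 60, 63]  # cluster C1 reported in Section 5
print(linear_segmentation(c1, 102))
# [(1, 7), (8, 18), (19, 26), (27, 30), (31, 36), (37, 39), (40, 59), (60, 62), (63, 102)]
```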

Note that clustering texts usually means
selecting the most important (most frequent) words
(terms) as clustering features ([5]). In our method
we choose as words the transitive verbs and complement
nouns which form the concepts in the FCA
approach ([4]). In what follows we refer to these words
(terms) as concept terms, namely concept attribute
terms, M, and concept object terms, G.

A sentence is represented as a vector of concept
terms: each entry of the vector specifies how often
a concept term occurs in the sentence, including the
frequency of its subconcept terms.

The following algorithm is an improvement of our
own algorithm introduced in ITBI.

Concept-oriented Clustering Segmentation algorithm COCS:

Input: Text = $\{S_1, \ldots, S_n\}$ of n sentences, and
the output of the CLTE algorithm:
K - the formal context, L - the concept lattice,
T - the taxonomy based on L.
Output: Different segmentations of Text,
according to different sets of concepts.

• Step1: Calculate the frequency f(i, t) of the
concept term $t \in G \cup M$ in the sentence $S_i$.

• Step2: Calculate the total frequency
of the concept term t in the sentence $S_i$ as
$Total_S(i, t) = f(i, t) + \sum_{t' \text{ a direct descendant of } t \text{ in the taxonomy}} f(i, t')$.

• Step3: Calculate the total frequency of t over all
sentences as $Total(t) = \sum_{i=1}^{n} Total_S(i, t)$.

• Step4: Choose the first $m \le |G \cup M|$ best
supported concept terms $t_1, \ldots, t_m$ (which
maximize Total(t)).

• Step5: Represent each sentence $S_i$ by an
m-concept-term vector

$V(i) = (Total_S(i, t_1), \ldots, Total_S(i, t_m))$.

• Step6: Apply a linear clustering of the set
of sentences Text = $\{S_1, \ldots, S_n\}$, using
the K-means algorithm, where $sim(S_i, S_j) = \cos(V(i), V(j))$.
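The vector-building steps of COCS (before the clustering step) can be sketched as below. This is a minimal illustration under stated assumptions: sentences are already lemmatized token lists, the taxonomy is given as a hypothetical `children` map of direct descendants, and ties in Total(t) are broken alphabetically. The descendants of concern (justice, system) follow the example in Section 5; the two token lists are hypothetical lemmatizations of sentences 1 and 14 of the law text.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cocs_vectors(
    sentences: Sequence[Sequence[str]],
    children: Dict[str, List[str]],
    terms: Sequence[str],
    m: int,
) -> Tuple[List[str], List[List[int]]]:
    """Return the m best-supported terms t_1..t_m and the vectors V(i)."""
    # f(i, t): occurrences of each token in sentence S_i.
    freq = [Counter(s) for s in sentences]
    # Total_S(i, t) = f(i, t) + sum of f(i, t') over direct descendants t' of t.
    total_s = [{t: f[t] + sum(f[c] for c in children.get(t, [])) for t in terms} for f in freq]
    # Total(t) = sum over all sentences of Total_S(i, t).
    total = {t: sum(row[t] for row in total_s) for t in terms}
    # Keep the m terms with the largest Total(t) (ties broken alphabetically).
    best = sorted(terms, key=lambda t: (-total[t], t))[:m]
    # V(i) = (Total_S(i, t_1), ..., Total_S(i, t_m)).
    vectors = [[row[t] for t in best] for row in total_s]
    return best, vectors


sentences = [
    ["law", "be", "system", "rule"],                                          # S_1
    ["law", "raise", "issue", "concern", "equality", "fairness", "justice"],  # S_14
]
children = {"concern": ["justice", "system"]}
best, vectors = cocs_vectors(sentences, children, ["concern", "law", "justice", "system"], m=2)
print(best, vectors)  # ['concern', 'law'] [[1, 1], [2, 1]]
```

The second vector's first entry is $Total_S(14, concern) = 1 + 1 + 0 = 2$, the value worked out in Section 5.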

Each cluster corresponds to a segmentation, as described above.
The concept terms specific to the cluster explain
the "view" of the segmentation and help the user
to understand the differences between clustering
(segmentation) results.

The clustering method used is K-means, which
we outline below.

K-means algorithm [11]:

Input: Text = {S_1, ..., S_n} of n sentences, and the
corresponding vectors {V(1), ..., V(n)} computed by
the COCS algorithm.

Output: The set of clusters C = {C_1, C_2, ..., C_k}

Begin
  Select k initial centroids:
    {f_1, f_2, ..., f_k} ⊆ {V(1), ..., V(n)}
  While the stopping criterion is not true Do
    For j = 1 to k Do
      C_j = {V(i) | ∀ l ≠ j, d(V(i), f_j) ≤ d(V(i), f_l)},
        where d(x, y) = 1 / cos(x, y)
    End-For
    For j = 1 to k Do
      f_j = (Σ_{V(i) ∈ C_j} V(i)) / |C_j|
    End-For
  End-While
End-algorithm

The K-means algorithm begins with a set of initial
cluster centers, selected so that they are as dissimilar
as possible. At each iteration of the While loop, each vector
is assigned to the cluster whose center is closest,
and then the centroids of the modified clusters are
recomputed as the mean of their members. The distance
between two vectors is computed as the inverse of
their similarity. The stopping criterion
can be the condition that the diameters of all clusters
are smaller than a threshold value, or that there are no
changes in C from the previous iteration. The diameter
of a cluster is the distance between the least similar
elements in the cluster.
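A compact Python sketch of this K-means variant is given below; it is not the authors' implementation. Two choices are our own assumptions: "as dissimilar as possible" is read as farthest-first initialization starting from the first vector, and the stopping criterion used is "no change in C". For the non-negative frequency vectors of COCS, choosing the centroid with the highest cosine is the same as choosing the one with the smallest inverse similarity. Zero vectors (sentences containing none of the selected concept terms) have similarity 0 to everything and end up in the first cluster.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # a zero vector is dissimilar to everything


def kmeans_cosine(vectors: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label in 0..k-1 for every vector."""
    # Farthest-first initialization: start from the first vector, then add
    # the vector least similar to its most similar chosen centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(list(vectors[far]))
    labels: List[int] = []
    for _ in range(max_iter):
        # Closest centroid = most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:  # stopping criterion: no change in C
            break
        labels = new_labels
        for j in range(k):  # recompute each centroid as the mean of its members
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


V = [[1, 0], [2, 0.1], [0, 1], [0.1, 3], [3, 0.2]]
print(kmeans_cosine(V, k=2))  # [0, 0, 1, 1, 0]
```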

5. Experimental results 

The algorithms proposed in the previous sections
were implemented and tested on texts from different
domains, such as art, music and law.

Considerations for implementation:

• For POS-tagging and lemmatization of verbs and
nouns we have used the Online CST tools, which
incorporate a tokenizer, a name recognizer, the Brill
POS-tagger (an error-driven transformation-based
tagger), a lemmatiser, an NP recognizer and other tools
(http://conexp.sourceforge.net/index.html).

• The pairs (transitive verb, noun as a direct
complement) were obtained using our specific rules
for determining this type of dependency.

• The most frequent verbs and nouns were chosen
as those that appear at least twice in the set of selected
pairs.

• The construction, from the concept lattice, of the
quasi-tree representing the taxonomy of the concept
terms is based on the rules proposed in
Subsection 4.1.

• The implementation of the COCS algorithm follows
the steps described above.

As an experimental result we describe an example
of a text from the law domain, consisting of 320 sentences.
An extract of 30 sentences appears in
Figure 2. The Concept lattice is computed with the
CLTE algorithm and visualized in Figure 1 using
ConExp, a software tool aimed at handling
the tasks involved in the study of lattice theory, mainly
formal concepts (more information is available at
http://conexp.sourceforge.net/index.html).

The taxonomy is too complex to be depicted, but
we present some paths in the corresponding quasi-tree
representing hierarchies of concept terms:

• inform → support → progress

• continue → represent → tradition

• have → influence → law-system

• codify → make → law

• develop → reject → principle

The COCS algorithm was applied only to the first
102 sentences of the initial text. At Step2, the
frequency of a concept term t in a sentence is obtained
as the sum of its own frequency and the frequencies
of the direct descendants of t in the taxonomy. For
example: $Total_S(14, concern) = f(14, concern) + f(14, justice) + f(14, system) = 1 + 1 + 0 = 2$.



Figure 1. Concept Lattice for a corpus in the law domain



There are 21 terms (representing the value of m in
Step4 of the COCS algorithm): {concern, have, kill, law, own,
offenders, include, write, boy, condone, preserve, eat, hold,
do, create, make, govern, provide, buy, shape, jewel}, used
as features for clustering. After the clustering process,
4 clusters were obtained.

The cluster $C1 = \{S_8, S_{19}, S_{27}, S_{31}, S_{37}, S_{40}, S_{60}, S_{63}\}$
is characterized by the concept terms {have, offenders,
write, condone, do, govern}, meaning that these terms
appear in the sentences of the cluster.
The corresponding linear segmentation of the text is:

$[S_1, S_7], [S_8, S_{18}], [S_{19}, S_{26}], [S_{27}, S_{30}], [S_{31}, S_{36}],
[S_{37}, S_{39}], [S_{40}, S_{59}], [S_{60}, S_{62}], [S_{63}, S_{102}]$.

The cluster $C2 = \{S_3, S_{14}, S_{20}, S_{53}, S_{54}, S_{68}, S_{71},
S_{74}, S_{94}\}$ is characterized by the concept terms
{concern, preserve, buy, shape, jewel} and provides
the segmentation: $[S_1, S_2], [S_3, S_{13}], [S_{14}, S_{19}],
[S_{20}, S_{52}], [S_{53}, S_{53}], [S_{54}, S_{67}], [S_{68}, S_{70}],
[S_{71}, S_{73}], [S_{74}, S_{93}], [S_{94}, S_{102}]$.



6. Conclusions and further work 

In this paper we applied FCA theory to obtain a
taxonomy (the CLTE algorithm) for concept-oriented
segmentation of a text. The COCS algorithm introduced in
this paper approaches the process of segmentation as a
clustering of the sentences of a text, using the
taxonomy learned from the text. Each cluster provides a
segmentation, explained by the concept terms specific
to that cluster.

As further work we propose to improve the taxonomy
learned from a text by also considering as
attribute-object pairs: (passive verb, corresponding
noun in the subject role). More experiments with
texts from different domains are needed in order to
evaluate our approach.
Text:

1. Law is a system of rules, usually enforced through a set of
institutions. 2. It shapes politics, economics and society in numerous
ways and serves as a primary social mediator of relations between
people. 3. Contract law regulates everything from buying a bus ticket
to trading on derivatives markets. 4. Property law defines rights and
obligations related to the transfer and title of personal (often referred
to as chattel) and real property. 5. Trust law applies to assets held
for investment and financial security, while tort law allows claims
for compensation if a person's rights or property are harmed. 6. If
the harm is criminalized in a statute, criminal law offers means by
which the state can prosecute the perpetrator. 7. Constitutional law
provides a framework for the creation of law, the protection of human
rights and the election of political representatives. 8. Administrative
law is used to review the decisions of government agencies, while
international law governs affairs between sovereign nation states
in activities ranging from trade to environmental regulation or
military action. 9. Writing in 350 BC, the Greek philosopher Aristotle
declared, "The rule of law is better than the rule of any individual."
10. Legal systems elaborate rights and responsibilities in a variety
of ways. 11. A general distinction can be made between civil law
jurisdictions, which codify their laws, and common law systems,
where judge made law is not consolidated. 12. In some countries,
religion informs the law. 13. Law provides a rich source of scholarly
inquiry, into legal history, philosophy, economic analysis or sociology.
14. Law also raises important and complex issues concerning
equality, fairness and justice. 15. "In its majestic equality", said the
author Anatole France in 1894, "the law forbids rich and poor alike
to sleep under bridges, beg in the streets and steal loaves of bread."
16. In a typical democracy, the central institutions for interpreting
and creating law are the three main branches of government, namely
an impartial judiciary, a democratic legislature, and an accountable
executive. 17. To implement and enforce the law and provide services
to the public, a government's bureaucracy, the military and police
are vital. 18. While all these organs of the state are creatures created
and bound by law, an independent legal profession and a vibrant
civil society inform and support their progress. 19. Constitutional and
administrative law govern the affairs of the state. 20. Constitutional
law concerns both the relationships between the executive, legislature
and judiciary and the human rights or civil liberties of individuals
against the state. 21. Most jurisdictions, like the United States and
France, have a single codified constitution, with a Bill of Rights.
22. A few, like the United Kingdom, have no such document. 23. A
"constitution" is simply those laws which constitute the body politic,
from statute, case law and convention. 24. A case named Entick v
Carrington illustrates a constitutional principle deriving from the
common law. 25. Mr Entick's house was searched and ransacked by
Sheriff Carrington. 26. When Mr Entick complained in court, Sheriff
Carrington argued that a warrant from a Government minister, the
Earl of Halifax, was valid authority. 27. However, there was no
written statutory provision or court authority. 28. The great end, for
which men entered into society, was to secure their property. 29. That
right is preserved sacred and incommunicable in all instances, where
it has not been taken away or abridged by some public law for the
good of the whole ... 30. If no excuse can be found or produced, the
silence of the books is an authority against the defendant, and the
plaintiff must have judgment.